knitr document van Steensel lab

TF reporter cDNA-count processing - stimulation 2

Introduction

I previously processed the raw sequencing data, optimized the barcode clustering, quantified the pDNA data, and normalized the cDNA data. In this script, I take a detailed look at the normalized cDNA data as a whole.

Analysis

First insights into data distribution - reporter activity distribution plots

Heatmap - display mean log2-activity for each TF in each condition
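
The code for this step is not shown in this rendering, so the following is only a minimal sketch of how a TF-by-condition matrix of mean log2 activities could be built in base R. The data frame `cDNA_df`, its columns (`tf`, `condition`, `activity`) and all values are hypothetical toy data, not taken from the actual analysis.

```r
# Toy input: one row per reporter measurement (hypothetical names and values)
cDNA_df <- data.frame(
  tf        = rep(c("TF_A", "TF_B"), each = 4),
  condition = rep(rep(c("ctrl", "stim"), each = 2), times = 2),
  activity  = c(1, 4, 8, 8, 2, 2, 2, 8)
)

# Work on the log2 scale so that fold changes become differences
cDNA_df$log2_activity <- log2(cDNA_df$activity)

# Mean log2 activity per TF (rows) and condition (columns)
mean_mat <- tapply(
  cDNA_df$log2_activity,
  list(tf = cDNA_df$tf, condition = cDNA_df$condition),
  mean
)

print(mean_mat)

# The matrix could then be passed to a heatmap function, for example:
# pheatmap::pheatmap(mean_mat)
```

Averaging after the log2 transform (rather than before) is an assumption here; it corresponds to a geometric mean on the original activity scale.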

Heatmap for native enhancers

# motfn=/home/f.comoglio/mydata/Annotations/TFDB/Curated_Natoli/update_2017/20170320_pwms_selected.meme
# odir=/home/m.trauernicht/mydata/projects/tf_activity_reporter/data/SuRE_TF_1/results/native-enhancer/fimo
# query=/home/m.trauernicht/mydata/projects/tf_activity_reporter/data/SuRE_TF_1/results/native-enhancer/cDNA_df_native.fasta

# nice -n 19 fimo --no-qvalue --thresh 1e-4 --verbosity 1 --o $odir $motfn $query 

Heatmap per TF - comparing design activities mutated vs. non-mutated

Heatmap per TF - only WT TF activities

Compute activity changes relative to their negative controls
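
As an illustration of this step, the sketch below subtracts the mean log2 activity of the matched negative controls from each reporter, within the same TF and condition. How negative controls are actually matched to reporters in this data set is not stated here, so the matching rule, the data frame `reporters`, the `neg_ctrl` flag and all values are assumptions made for the example.

```r
# Toy input (hypothetical): one negative control and two reporters per TF
reporters <- data.frame(
  tf            = rep(c("TF_A", "TF_B"), each = 3),
  condition     = "stim",
  neg_ctrl      = rep(c(TRUE, FALSE, FALSE), times = 2),
  log2_activity = c(0.5, 2.5, 1.5, 1.0, 1.0, 3.0)
)

# Mean log2 activity of the negative controls per TF and condition
neg_mean <- aggregate(
  log2_activity ~ tf + condition,
  data = reporters[reporters$neg_ctrl, ],
  FUN  = mean
)
names(neg_mean)[names(neg_mean) == "log2_activity"] <- "neg_log2_activity"

# Attach the control baseline to every non-control reporter
rel <- merge(
  reporters[!reporters$neg_ctrl, ],
  neg_mean,
  by = c("tf", "condition")
)

# On the log2 scale, the change relative to the control is a difference
rel$log2_change <- rel$log2_activity - rel$neg_log2_activity

print(rel[, c("tf", "condition", "log2_activity", "neg_log2_activity", "log2_change")])
```

A `log2_change` of 0 means the reporter is no more active than its negative control; a value of 1 corresponds to a two-fold increase.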

Together, these heatmaps indicate that we have informative reporters for ~10 TFs, and that the TF reporter design matters for some but not all TFs.

Log-linear expression modelling to explain variance - model for each TF
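
A rough sketch of what a per-TF model of this kind could look like: one linear model per TF, fitted on log2 activity, with R-squared as the fraction of variance explained. The predictors used in the actual models are not listed here, so the formula `log2_activity ~ design + condition`, the column names and the simulated values are all assumptions for illustration.

```r
# Toy data (hypothetical): 2 TFs x 2 designs x 2 conditions x 2 replicates
toy <- expand.grid(
  replicate = 1:2,
  design    = c("design_1", "design_2"),
  condition = c("ctrl", "stim"),
  tf        = c("TF_A", "TF_B"),
  stringsAsFactors = FALSE
)

# Deterministic replicate offsets stand in for measurement noise
noise <- ifelse(toy$replicate == 1, 0.1, -0.1)

# In this toy example the design matters for TF_A only
design_effect <- ifelse(toy$tf == "TF_A", 2, 0)
toy$log2_activity <- 1 +
  design_effect * (toy$design == "design_2") +
  3 * (toy$condition == "stim") +
  noise

# Fit one additive model per TF on the log2 scale
fits <- lapply(
  split(toy, toy$tf),
  function(d) lm(log2_activity ~ design + condition, data = d)
)

# Variance explained and estimated design effect per TF
r_squared   <- sapply(fits, function(fit) summary(fit)$r.squared)
design_coef <- sapply(fits, function(fit) coef(fit)[["designdesign_2"]])

print(round(r_squared, 3))
print(round(design_coef, 3))
```

Because the response is on the log2 scale, each coefficient reads as a log2 fold change attributable to that feature, which is one way to ask whether the reporter design matters for a given TF.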

Log-linear expression modelling to explain variance - model for each TF - only WT - without condition

Make the same models as before - but now per TF and per condition

Session Info

paste("Run time: ",format(Sys.time()-StartTime))
## [1] "Run time:  3.537715 mins"
getwd()
## [1] "/DATA/usr/m.trauernicht/projects/SuRE-TF/gen-1_stimulation-2"
date()
## [1] "Wed Dec  9 12:43:37 2020"
sessionInfo()
## R version 3.6.3 (2020-02-29)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.7 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/libblas/libblas.so.3.6.0
## LAPACK: /usr/lib/lapack/liblapack.so.3.6.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] tidyr_1.0.0        stringr_1.4.0      readr_1.3.1        GGally_1.5.0      
##  [5] gridExtra_2.3      cowplot_1.0.0      plyr_1.8.6         viridis_0.5.1     
##  [9] viridisLite_0.3.0  ggforce_0.3.1      ggbeeswarm_0.6.0   ggpubr_0.2.5      
## [13] magrittr_1.5       pheatmap_1.0.12    tibble_3.0.1       maditr_0.6.3      
## [17] dplyr_0.8.5        ggplot2_3.3.0      RColorBrewer_1.1-2
## 
## loaded via a namespace (and not attached):
##  [1] prettydoc_0.4.0   beeswarm_0.2.3    tidyselect_1.1.0  xfun_0.19        
##  [5] purrr_0.3.3       lattice_0.20-38   splines_3.6.3     colorspace_1.4-1 
##  [9] vctrs_0.2.4       htmltools_0.5.0   mgcv_1.8-31       yaml_2.2.1       
## [13] rlang_0.4.8       pillar_1.4.3      glue_1.4.2        withr_2.1.2      
## [17] tweenr_1.0.1      lifecycle_0.2.0   munsell_0.5.0     ggsignif_0.6.0   
## [21] gtable_0.3.0      evaluate_0.14     labeling_0.3      knitr_1.30       
## [25] vipor_0.4.5       Rcpp_1.0.5        scales_1.1.0      farver_2.0.1     
## [29] hms_0.5.3         digest_0.6.27     stringi_1.5.3     polyclip_1.10-0  
## [33] grid_3.6.3        tools_3.6.3       crayon_1.3.4      pkgconfig_2.0.3  
## [37] Matrix_1.2-18     ellipsis_0.3.0    MASS_7.3-51.5     data.table_1.12.8
## [41] assertthat_0.2.1  rmarkdown_2.5     reshape_0.8.8     R6_2.5.0         
## [45] nlme_3.1-143      compiler_3.6.3